feat(compare-rendering): add Word↔SuperDoc layout comparison by caio-pizzol · Pull Request #2891 · superdoc-dev/superdoc

caio-pizzol · 2026-04-22T00:23:39Z

A dev tool that compares how Word and SuperDoc render the same .docx. Run it with `pnpm compare-rendering`. Instead of eyeballing screenshots, it prints a list of differences (text, pagination, structure) and points at the SuperDoc code to look at.
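To make "prints a list of differences" concrete, here is a minimal sketch of a paragraph-level diff. It assumes each renderer's output has already been reduced to an ordered list of paragraphs with resolved text and starting page. `ParagraphState`, `diffParagraphs`, and the message formats are hypothetical names for illustration, not the actual `src/differ.ts` API.

```typescript
// Hypothetical shapes for illustration; the real types in src/differ.ts may differ.
type Category = "text" | "pagination" | "structure" | "unsupported";

interface ParagraphState {
  text: string; // resolved paragraph text
  page: number; // 1-based page on which the paragraph starts
}

interface Finding {
  category: Category;
  paragraphOrdinal: number; // 0-based position in document order
  message: string;
}

// Pair paragraphs by ordinal and emit one finding per mismatch.
function diffParagraphs(word: ParagraphState[], superdoc: ParagraphState[]): Finding[] {
  const findings: Finding[] = [];
  if (word.length !== superdoc.length) {
    // A count mismatch is reported once, at the first ordinal only one side has.
    findings.push({
      category: "structure",
      paragraphOrdinal: Math.min(word.length, superdoc.length),
      message: `paragraph count: Word ${word.length}, SuperDoc ${superdoc.length}`,
    });
  }
  const shared = Math.min(word.length, superdoc.length);
  for (let i = 0; i < shared; i++) {
    const w = word[i];
    const s = superdoc[i];
    if (w.text !== s.text) {
      findings.push({
        category: "text",
        paragraphOrdinal: i,
        message: `text: ${JSON.stringify(w.text)} vs ${JSON.stringify(s.text)}`,
      });
    }
    if (w.page !== s.page) {
      findings.push({
        category: "pagination",
        paragraphOrdinal: i,
        message: `page: Word ${w.page}, SuperDoc ${s.page}`,
      });
    }
  }
  return findings;
}
```

Pairing by ordinal keeps the sketch simple; a real differ might align paragraphs first so that one missing paragraph does not cascade into many text findings.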

Scope is small on purpose: paragraph-only docs. Tables, images, comments, and tracked changes get skipped with a clear reason, so we never fake a "looks fine."
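A minimal sketch of the short-circuit, assuming the extractor reports a count per unsupported feature. `FeatureCounts`, `checkSupported`, and the `severity: "info"` value are hypothetical; the point is that any non-zero count yields one skipped finding that names the reason, instead of a diff.

```typescript
// Hypothetical feature counts; the real extractor's output shape is an assumption.
interface FeatureCounts {
  tables: number;
  inlineShapes: number;
  floatingShapes: number;
  trackedChanges: number;
  comments: number;
}

interface SkippedFinding {
  category: "unsupported";
  severity: "info"; // assumed severity value
  message: string;
}

// Return one "skipped" finding naming every unsupported feature present,
// or null when the document is paragraph-only and safe to diff.
function checkSupported(counts: FeatureCounts): SkippedFinding | null {
  const reasons = (Object.keys(counts) as (keyof FeatureCounts)[])
    .filter((k) => counts[k] > 0)
    .map((k) => `${k}=${counts[k]}`);
  if (reasons.length === 0) return null;
  return {
    category: "unsupported",
    severity: "info",
    message: `skipped: contains ${reasons.join(", ")}`,
  };
}
```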

- Talks to Word through the word-api REST endpoint and to SuperDoc through the existing `pnpm layout:export-one`.
- Caches Word output so re-runs don't hit the VM again.
- 13 unit tests on the diff logic. Full 75-doc corpus: 20 match, 40 differ, 13 skipped.

Roadmap is in the README. Next is baselines (so an agent can tell what its change affected) and a screenshot judge for things schema diff can't see.

Review focus: `src/word.ts` polling, `src/differ.ts` category routing, `src/extract-layout.ps1` short-circuit. Skip `scripts/batch.ts`.

Diffs resolved paragraph state between Word (via word-mcp `run_powershell` on a Windows VM) and SuperDoc (via `layout:export-one`) for paragraph-only .docx files. Emits typed `Finding[]` with `category`/`severity`/`specRef`/`codeArea` hints so an agent consumer can route fixes to the right SuperDoc module.

Unsupported features (tables, inline/floating shapes, tracked changes, comments) short-circuit with a skipped finding rather than producing a misleading diff. Word extraction is cached by `sha256(docx)` + `sha256(extract-layout.ps1)`, so edits to the PowerShell script bust the cache automatically.

Scope: paragraph-only flow. Categories emitted: `text`, `pagination`, `structure`, `unsupported`. Style, indent, color, and numbering are deferred to M2.
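A sketch of that cache key, assuming it is simply the two hex digests joined with a hyphen (the exact format used in `cache.ts` is not stated here). It uses Node's built-in `node:crypto`; `wordCacheKey` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

function sha256(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Combine both digests: editing extract-layout.ps1 changes every key,
// which invalidates all cached Word extractions at once.
function wordCacheKey(docx: Buffer, extractScript: string): string {
  return `${sha256(docx)}-${sha256(extractScript)}`;
}
```

In use this would look like `wordCacheKey(readFileSync(docxPath), readFileSync(scriptPath, "utf8"))`, with `docxPath` and `scriptPath` standing in for the real paths.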

Move from the MCP JSON-RPC envelope to word-api's REST `/v1/executions` + `/v1/jobs/:id` polling. Smaller, clearer error taxonomy, and aligned with the direction the API is taking (async-first).

- `word.ts` shrinks ~30 lines: drop the SSE parser, content-type dispatch, and regex JSON fallback. Plain JSON envelope all the way.
- Poll interval is 500ms with a `timeoutSeconds * 1000 + 30s` outer deadline, so a stuck job can't pin a batch forever.
- Cache key and short-circuit behavior unchanged.
- `WORD_API_URL` / `WORD_API_TOKEN` replace `WORD_MCP_URL` / `WORD_MCP_TOKEN`.

Also ship `scripts/batch.ts`, the ad-hoc corpus sweep we used to pressure-test the refactor, kept as a stepping stone to M2's proper `--input-dir`.

README milestones revised after M1 corpus-batch insights: M2 is now baseline + delta reporting (agent-usable signal), M3 the LLM screenshot judge (catches false negatives schema diff cannot see, e.g. border-style rendering on sd-1741), with resolved style fields and tables pushed to M4/M5.
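The polling behavior can be sketched as below. This is an assumption-laden illustration: the `Job` shape and status values are hypothetical (the real `/v1/jobs/:id` response may differ), and the HTTP call is injected as `getJob` so the loop can be exercised without a VM.

```typescript
// Hypothetical job shape; word-api's real response fields may differ.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface Job {
  status: JobStatus;
  output?: string;
  error?: string;
}

const POLL_INTERVAL_MS = 500;
const GRACE_MS = 30_000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job finishes or the outer deadline
// (timeoutSeconds * 1000 + 30s) passes.
// getJob stands in for GET /v1/jobs/:id.
async function pollJob(
  getJob: () => Promise<Job>,
  timeoutSeconds: number,
  intervalMs: number = POLL_INTERVAL_MS,
): Promise<string> {
  const deadline = Date.now() + timeoutSeconds * 1000 + GRACE_MS;
  while (Date.now() < deadline) {
    const job = await getJob();
    if (job.status === "succeeded") return job.output ?? "";
    if (job.status === "failed") throw new Error(`job failed: ${job.error ?? "unknown"}`);
    await sleep(intervalMs);
  }
  throw new Error(`job did not finish within ${timeoutSeconds}s + ${GRACE_MS / 1000}s grace`);
}
```

Only the 500ms interval and the outer deadline come from the description above; the rest is placeholder.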

harbournick · 2026-04-22T00:27:26Z

@caio-pizzol sounds interesting. let's just make sure we're not both doing the same work? I'm doing a lot around this in labs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 370bb8823e


codecov-commenter · 2026-04-22T00:29:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


…ia fileURLToPath

Two small fixes from PR review:
- `parseExtractionOutput` searched for `JSON_END` from offset 0, so a document whose paragraph text contained the literal string "JSON_END" could truncate the payload before `JSON.parse`. Search from past `JSON_BEGIN` instead.
- `REPO_ROOT` in `superdoc.ts` was using `URL.pathname`, which yields `/C:/…` on Windows and keeps URL-encoded special chars. Use `fileURLToPath` like we already do in `cache.ts` / `word.ts`.
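A minimal sketch of the first fix, assuming the PowerShell script prints its payload between literal `JSON_BEGIN` and `JSON_END` markers. The body is illustrative, not the actual `parseExtractionOutput`; the key change is passing a start position to `indexOf` so the end-marker search begins after `JSON_BEGIN`.

```typescript
const JSON_BEGIN = "JSON_BEGIN";
const JSON_END = "JSON_END";

// Extract the JSON payload printed between the two markers.
function parseExtractionOutput(stdout: string): unknown {
  const begin = stdout.indexOf(JSON_BEGIN);
  if (begin === -1) throw new Error("missing JSON_BEGIN marker");
  const payloadStart = begin + JSON_BEGIN.length;
  // Start searching after JSON_BEGIN, so any "JSON_END" earlier in the output is ignored.
  const end = stdout.indexOf(JSON_END, payloadStart);
  if (end === -1) throw new Error("missing JSON_END marker");
  return JSON.parse(stdout.slice(payloadStart, end));
}
```

The sketch only skips markers that occur before `JSON_BEGIN`. If the marker could also appear inside the payload itself, searching backwards with `lastIndexOf` would be a stricter variant.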

caio-pizzol · 2026-04-22T09:42:21Z

> @caio-pizzol sounds interesting. let's just make sure we're not both doing the same work? I'm doing a lot around this in labs.

Yep! I know you are :) I was trying something with word-mcp, which was already in place. Interesting findings; it will all end up under the same tool in labs eventually.

The tool was producing absolute findings, which is fine for humans reading a report but useless for an agent trying to tell whether its change helped or hurt. An agent running against 367 pre-existing findings can't distinguish "my edit caused these" from "SuperDoc was already like this."

This adds:
- a stable `fingerprint` per finding (`category:paragraphOrdinal`, scoped per doc);
- snapshot/replay via `--save-baseline` and `--baseline`;
- a delta report that names only resolved and new findings.

Exiting with code 2 on any new finding makes the tool usable as a CI gate. The ad-hoc `scripts/batch.ts` is folded into `--input-dir`, so the CLI is the single entry point.
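The baseline/delta flow can be sketched as follows, assuming findings are compared by a corpus-wide key of document name plus the per-doc `category:paragraphOrdinal` fingerprint. `computeDelta`, `exitCodeFor`, the `#` separator, and exit code 0 for the no-new-findings case are assumptions for illustration; only the fingerprint format and exit 2 come from the description.

```typescript
interface Finding {
  doc: string; // .docx file the finding belongs to
  category: string;
  paragraphOrdinal: number;
}

// Per-doc fingerprint: `category:paragraphOrdinal`.
function fingerprint(f: Finding): string {
  return `${f.category}:${f.paragraphOrdinal}`;
}

// Corpus-wide key (assumed format): document name plus fingerprint.
const keyOf = (f: Finding): string => `${f.doc}#${fingerprint(f)}`;

interface Delta {
  resolved: Finding[]; // in the baseline, gone now
  newFindings: Finding[]; // absent from the baseline, present now
}

function computeDelta(baseline: Finding[], current: Finding[]): Delta {
  const before = new Set(baseline.map(keyOf));
  const after = new Set(current.map(keyOf));
  return {
    resolved: baseline.filter((f) => !after.has(keyOf(f))),
    newFindings: current.filter((f) => !before.has(keyOf(f))),
  };
}

// Exit 2 when the change introduced any finding, so CI can gate on it.
function exitCodeFor(delta: Delta): number {
  return delta.newFindings.length > 0 ? 2 : 0;
}
```

Because this fingerprint is positional, a paragraph inserted early in a document shifts every later ordinal and can surface as a batch of resolved/new pairs; that is the trade-off of keeping the fingerprint this simple.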

caio-pizzol added 2 commits April 21, 2026 21:22

superdoc-bot added the `review: quick` label Apr 22, 2026

caio-pizzol marked this pull request as draft April 22, 2026 00:24

caio-pizzol changed the title ~~feat(compare-rendering): add Word↔SuperDoc paragraph-diff CLI (M1)~~ feat(compare-rendering): add Word↔SuperDoc layout comparison Apr 22, 2026

chatgpt-codex-connector reviewed Apr 22, 2026


Comment thread on `devtools/compare-rendering/src/word.ts` (outdated)

Comment thread on `devtools/compare-rendering/src/superdoc.ts` (outdated)

caio-pizzol closed this Apr 22, 2026

caio-pizzol wants to merge 4 commits into `main` from `caio/compare-rendering`

3 participants
